home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
C/C++ Users Group Library 1996 July
/
C-C++ Users Group Library July 1996.iso
/
vol_300
/
370_01
/
gat_ref.asc
< prev
next >
Wrap
Text File
|
1992-10-27
|
118KB
|
3,265 lines
GA Toolkit
Version 1.0
Sara A. Lienau
University of Missouri-Kansas City
May 20, 1992
TABLE OF CONTENTS
{toc \f|1. INTRODUCTION 1
Overview of Genetic Algorithms 1
The GA Toolkit 3
Overview of Documentation 4
2. INSTALLATION 5
3. USER INTERFACE PROGRAM 8
Selecting GA Components 8
Description of Available Components 9
Selecting the Problem to Solve 12
Description of Available Problems 12
Creating an Executable GA Program 14
Creating Input Files 15
Creating the "in" File 16
Description of Input Parameters 17
Creating the "genes" File 20
Defining a New Problem 24
4. EXECUTING YOUR GA PROGRAM 29
Description of Output Files 30
5. PROGRAMMER'S REFERENCE 33
Description of Classes 33
Description of #define Directives 36
Adding a New GA Component 38
Adding a New Parameter 50
REFERENCES 56
}
CHAPTER 1
{tc "1. INTRODUCTION"}Introduction
This document describes a software tool for researching and applying Genetic
Algorithms (GAs), called the GA Toolkit. GAs are powerful search and
optimization techniques based on the principles of natural selection and
population genetics. This chapter provides a brief overview of Genetic
Algorithms, describes the GA Toolkit and present an overview of the remainder
of this document.
{tc "Overview of Genetic Algorithms" \l 2}Overview of Genetic Algorithms
GAs manipulate a population of sample points from the search space. Each
sample point or individual in the population has several
components--chromosome, phenotype and fitness value. The chromosome is a coded
representation for a particular point in the search space. In biological
systems, the chromosomes contains the coded instructions for creating all the
features of an organism. Similarly, in GAs, the chromosome contains a coded
representation for the various features or parameters for the problem the GA is
attempting to optimize. A phenotype is the expression of the genetic material.
For example, a gene on a human chromosome may contain the code for blue eyes
and the human with blue eyes is the expression of this genetic material or the
phenotype. In GAs, the phenotype corresponds to decoding the chromosome
structure into components or parameters that are meaningful in terms of the
problem the GA is attempting to solve. The GA treats the optimization problem
as a black box--it applies a set up inputs to the problem and measures the
overall performance (called the fitness of the solution) of the set of inputs.
The GA does not require knowledge of nor does it understand the relationships
that may exist between various parameters of the problem. The GA uses the
fitness value of the sample points and the similarities in their coded
representation (the chromosome) to search for better points in the search
space.
This concept comes from viewing the evolution of species as an optimization
process. Darwin's "survival of the fittest" principle (also called natural
selection) suggested that species become better adapted to their environment
because only the "fittest" individuals are able to survive and reproduce
passing their genetic code on to subsequent generations. The "fittest"
individuals are those with a some type of beneficial features or
characteristics that enables the creature to survive and reproduce in its
environment. As individuals with various beneficial characteristics mate
subsequent generations are more likely to contain individuals with combinations
of advantageous features that were possessed by their parents. And over time
the population as a whole becomes better suited for its environment by
preserving and propagating the beneficial characteristics.
GAs use the principle of natural selection to mate the better individuals in
the population with a higher frequency. The mating process, called crossover,
exchanges segments of the parents' coded chromosome to create new offspring.
This process allows the GA the combine features of good solutions in the hope
of developing better solutions to the problem.
The following describes the basic execution cycle of a Genetic Algorithm.
Initialization
Evaluation
repeat
Selection
Recombination
Evaluation
Replacement
until ( termination condition satisfied )
The Initialization phase randomly creates an initial population. At first the
GA has no information about the search space, so randomly sampling a number of
points from the search space is a good place to start the search process. In
the Evaluation phase, each chromosome is decoded and a fitness value is
assigned using the fitness function defined for the problem. In the Selection
phase, individuals in the population are selected for mating based on their
fitness with more fit individuals having a greater probability of being
selected. During the Recombination phase, the individuals selected in the
previous phase mate via a crossover procedure and exchange segments of their
coded chromosome to create new offspring. In addition, during this phase, the
new offspring or the parents may undergo mutation that slightly modifies their
genetic code. Then the new offspring and any parents that were modified by
mutation are re-evaluated as described above. And finally, the Replacement
phase determines which offspring and members of the current population will
survive into the next generation. This cycle repeats until a termination
condition has been satisfied, either a sufficiently fit population member or a
maximum number of reproductive trials. For a more detailed description of
Genetic Algorithms see Goldberg (1989) or Holland (1975).
{tc "The GA Toolkit" \L 2}The GA Toolkit
The description of GAs above presents a general framework for these algorithms.
There are actually various implementation for each phase, that have been
proposed by the leading researchers in the field, each with varying degrees of
effectiveness on different types of problems. With the variety of techniques
available, it is difficult to package together a single GA configuration that
is best suited for all problem domains. The GA Toolkit is a software tool
designed to provide a better alternative for researching and applying Genetic
Algorithms.
The GA Toolkit provides several techniques for each phase and allows the user
to select which component to use. The GA Toolkit will actually create an
executable GA program with the components that the user selects. This system
allows the user to create a variety of GA programs for solving different types
of problems. In addition, the Toolkit was designed so that new GA components
could easily be added to the system as better techniques are developed.
This tool offers many benefits to a variety of users. For those users that
want to apply GAs, the Toolkit provides a collection of the most successful
techniques available. These users can gain the benefits of using GAs without
detailed knowledge of the underlying theory and without developing their own
program code. For GA researchers, this tool greatly increases the ability to
explore the effectiveness of different techniques in a variety of problem
domains. In addition, the GA Toolkit allows the researcher to experiment with
a particular technique in a number of different contexts. For example, the
researcher could observer the performance of a selection technique using
different crossover and replacement methods and develop a better evaluation of
the selection procedure. This ability to isolate the effectiveness of a
particular method is essential for developing useful guidelines on which
components should be used in different problem domains.
{tc "Overview of Documentation" \L 2}Overview of Documentation
The remainder of this document describes how use the GA Toolkit. Chapter 2
describes the installation process. Chapter 3 discusses the user interface
program that allows the user to select the desired components and create an
executable GA program. This chapter also describes the GA components and the
set of problems that are currently implemented in the GA Toolkit. Chapter 4
discusses how to execute the GA programs you create with the Toolkit and
describes the output files produced by a GA program. And finally, Chapter 5
provides a reference for programmers who which to modify the system. This
chapter describes the classes used to implement the GA techniques and discusses
how to add new GA components to the Toolkit.
Chapter 2
{tc "2. INSTALLATION"}Installation
To install the GA Toolkit you need two files: install and gat.tar. Currently
these files are located in the /home/slienau/Toolkit directory on the Sun SPARC
workstations. If this directory does not exist when want to install the GA
Toolkit, check with Dr. Graham about the new location of these files. Once you
have found the necessary files you need to copy them to your home directory.
The install file is a Bourne shell script that will create a main directory for
the GA Toolkit, un-archive the files in the gat.tar file and create the
executable user interface program (gatuif). Use the following command to
execute the installation program.
% install [directory name]
The optional parameter allows you to specify the name of the directory where
you want the GA Toolkit installed. If this parameter is omitted the Toolkit
will be installed a directory called GATool. During the installation process
several messages will be displayed about the files that are being un-archived.
When the installation is complete a message will be displayed on your terminal.
This message will inform you that you need to add the information in the
cshrc.gat file (located in the Toolkit's main directory) to your .cshrc file in
your home directory. This file contains several environment variables that are
necessary for the Toolkit to operate correctly. The following shows the
contents of this file. Note: In your file, the GATOOL environment variable
may have a different directory name.
#
# GA Toolkit Env Variables
#
setenv GATOOL $home/GATool
setenv GATSRC $GATOOL/Source
setenv GATFNS $GATSRC/Funcs
setenv GATIN $GATOOL/Input
setenv GATHLP $GATOOL/Help
setenv GATUIF $GATOOL/Uif
setenv GATBIN $GATOOL/SH_BIN
set path = ($GATBIN $GATUIF $path)
The installation program creates several sub-directories under the directory
where the GA Toolkit was installed. The following briefly describes the
contents of each directory.
Source contains the source code for all the GA components.
Source/Funcs contains the source code for the test problems provided with the
Toolkit.
Input contains input files for the test problems in the Source/Funcs directory.
Help contains the help files for the user interface program.
Uif contains the source code for the user interface program.
SH_BIN contains several Bourne shell programs used by the user interface
program.
The library of components in the GA Toolkit were written in the C++ language.
In order to create executable GA programs you must be able to execute the C++
compiler. If you do not have the proper environment variable set up in your
.cshrc file for the C++ compiler, then you will also need to add this
information. The information you, currently, need to include in your .cshrc
file for the C++ compiler to work properly is shown below.
#
# C++ Env Variables (sun C++)
#
setenv MANPATH /usr/share/man:/usr/local/man:/usr/man
setenv CCROOTDIR /usr/local/lang/SC1.0
setenv CCLIBDIR $CCROOTDIR
setenv I $CCROOTDIR/include/CC
setenv LIBRARY $CCROOTDIR/libC.a
setenv cfrontC $CCROOTDIR/cfront
set path = (/usr/local/lang/SC1.0 $path)
After you have made the necessary modifications to your .cshrc file you should
logout and login again so that the changes will take effect. Now you have
completely installed the Toolkit and are ready to begin working with the user
interface program described in the next chapter.
Chapter 3
{tc "3. USER INTERFACE PROGRAM"}User Interface Program
The user interface program for the GA Toolkit allows the user to select a
component for each phase of the GA, select the problem to solve, create the
necessary input files, create an executable GA program and create a template
for defining a new problem. This section describes how to uses the various
features of the user interface program.
To start the user interface program use the following command:
% gatuif
The name "gatuif" is an acronym for GA Toolkit User InterFace. After a few
seconds the main menu will appear on your terminal. From the main menu, you
can access all the major features of the user interface program, such as
selecting GA components and creating an executable GA program. Pressing the
return key will activate the highlighted choice. The following describes each
of the options available on the main menu.
{tc "Selecting GA Components" \L 2}Selecting GA Components
The first step in creating an executable GA program is selecting which
components to use for each phase of the GA process. Activating the Select GA
Components option will bring up a screen divided into two parts. On the left
side of the screen there is a menu of components and the right portion of the
screen shows the current configuration--the component that is currently
selected for each phase. To change the component for a particular phase, use
the up and down arrows to select the category on the menu (the left side of the
screen). Pressing the return key will bring up a list of available components
or a sub-menu for the highlighted category. Once the desired technique is
highlighted, pressing the return key will select that technique. To see a
short description of a particular GA component highlight it by moving the up
and down arrow keys and press the F1 key (or the help key for your terminal).
When you are satisfied with the configuration of components, press CTRL-U to
return to the main menu.
This portion of the user interface program also manages any constraints on
which components may used together. When you select a category from the menu
(e.g., Selection or Replacement) only those techniques that are compatible with
the current configuration will be displayed. This will prevent you from
accidentally creating an invalid GA configuration.
{tc "Description of Available Components" \L 3}Description of Available
Components
Chromosome--Representation - The chromosome representation is how the
chromosome structure is implemented. Currently the Toolkit only provides one
implementation of the chromosome structure, but others, such as diploid or
two-dimensional chromosomes, could be added.
1-D Bit String - With this component, the chromosome is implemented as a one
dimensional string of ones and zeros.
Chromosome--Decoding - The chromosome decoding specifies how the chromosome
string will be interpreted or decoded for the evaluation function.
Binary - This component provides binary decoding of the chromosome string
(i.e., converting the string of 1s and 0s to a decimal integer).
Gray - This component allows the chromosome string to be interpreted as Gray
code. With Gray code, adjacent integers (e.g., 7 and 8 or 8 and 9) have a
Hamming distance of one.
Reproduction - The reproduction style defines how many offspring are produced
before the replacement phase is initiated.
Generational - In generational reproduction, an entire generation of offspring
are produced and then the replacement phase determines which offspring and
parents will survive in the next generation.
Individual - With individual reproduction, as soon an offspring is created the
replacement phase selects a member of the population to be removed in order to
make room of the new offspring. The concept behind individual reproduction is
that the GA could adapt faster if offspring are made available as soon as they
are produced rather than waiting for a complete generation.
Selection - The selection technique determines which members will be given an
opportunity to reproduce and pass their genetic code onto subsequent
generations. In general, the selection technique will provide more
reproductive trials to the more fit individuals of the population.
SUS - In Stochastic Universal Sampling (SUS) (Baker 1987), each individual is
allocated a slice on a spinning wheel according to its expected value. An
individual's expected value is the ratio of the its fitness to average fitness
of the population. The wheel contains N evenly spaced pointers, where N
is the size of the population. The wheel is spun one time and the individual
at each pointer is allocated a reproductive trial. With this selection
technique, each individual is guaranteed to receive either {EMBED Equation |}
or {EMBED Equation |} trials.
GENESIS Rank - GENESIS Rank selection (Grefenstette 1990) is similar to SUS
using a spinning wheel with N evenly spaced pointer. But the size of the
slice on the wheel is determined by the individual's rank within the population
rather than the actual fitness value.
Linear Rank - Linear Rank selection (Whitley 1989) uses a linear function to
bias the allocation of reproductive trials toward the better ranked
individuals. Linear Rank selection provides a parameter, called Selection
Bias, so that the degree of preference for the best members can be adjusted.
For example, a Selection Bias of 1.5 means that the best individual is 1.5
times more likely to receive a reproductive trial than the median ranked
individual in the population.
Replacement - The replacement phase determines which individuals will be
replaced when new offspring are produced. Since the replacement phase is
initiated at different times with the different styles of reproduction, there
are several techniques that are appropriate for each type of reproduction.
Replacement Techniques for Generational Reproduction
DeJong's Gap w/Single Elitism - This technique implements DeJong's Gap model
(1975) that allows overlapping population. The DeJong Gap specifies what
percentage of the next generation should be created by the genetic operators
(selection, crossover and mutation) and the remainder of the population is
filled by randomly selecting, without replacement, individuals from the parent
population. For example, a DeJong Gap of 0.8 would mean that 80% of the next
generation would be new offspring and the remaining 20% would be randomly
selected members from the parent population. This technique also provides a
single elitism feature which guarantees that the best solution developed so far
will always have a slot in the next generation.
Variable Elitism - This replacement policy allows the user to specify the
percentage of the population slots (by the Elite Gap parameter) to reserve for
the best solutions developed so far and the remaining slots are filled by new
offspring. With this technique, N offspring are created by the genetic
operators (selection, crossover and mutation) and then the best
N * Elite Gap individuals from among both the new offspring and parents
receive a slot in the next generation. Any remaining population slots are
filled by randomly selecting, without replacement, members from the set of new
offspring. For example, an Elite Gap of 0.35 reserves 35% of the next
generation for the best individuals and the remaining 65% is filled with new
offspring.
Replacement Techniques for Individual Reproduction
Remove Worst - With this replacement policy, when an offspring is created the
worst member of the population is replaced.
Reverse Rank - This replacement policy uses a linear function to bias the
selection for replacement towards the worst ranked members of the population.
The Replacement Bias parameter can be adjusted to altered the pressure on the
worst individuals. For example, a Replacement Bias of 1.2 means that the worst
individual in 1.2 times more likely to be replaced than the median ranked
individual in the population. In addition, the Elite Gap parameter can also be
used to explicitly preserve a percentage of the best ranked individuals. For
example, an Elite Gap of 0.1 with a population size of 50 means that the best
10% of the population (the 5 highest ranked members) cannot be replaced.
Reverse Tournament - With this replacement technique, two members from the
population are randomly selected and the least fit member is replaced by the
new offspring.
Generational Emulation - This replacement policy is attempting to emulate the
replacement policy used in standard generational GAs while using individual
reproduction. This policy monitors the trials allocated and used by each
individual. When a member has exhausted its allocated trials or did not
receive any trials because of low fitness it becomes a candidate for
replacement. When an offspring is created, a randomly selected individual from
the set of candidates is replaced. This replacement policy can only be used
with selection techniques that explicitly calculate the number of trials each
individual should receive (i.e., SUS and GENESIS Rank selection).
Crossover--Style - The crossover style specifies how the desired number of cut
points will be selected.
Traditional - With traditional crossover the cut points are randomly selected.
Reduced Surrogate - This crossover technique (Booker 1987) attempts to create
offspring that differ from their parents whenever possible. This techniques
notes the position on the chromosome string where the parents have different
values (called the "reduced surrogate") and randomly selects cuts points
between the first and last position in the reduced surrogate. For example,
Parent1 11001100
Parent2 10010110
Reduced -1-01-0-
Surrogate -0-10-1-
A cut point any where after the second position and the before the seventh
position would produce an offspring that differed from its parents.
Uniform - With Uniform crossover (Syswerda 1989), the parents' value for each
position on the chromosome string is swapped with a fixed probability to create
new offspring. For example, if the swapping probability is 0.5, then for each
position on the chromosome string we would flip a fair coin to decide whether
the offspring should receive the bit value from the first or second parent.
Crossover--# Cut Points - This category specifies the number of cut points that
will be selected or the how many segments to exchange between parents when
creating new offspring. You can select either one point or two point
crossover.
Mutation
Canonical - Once a bit has be selected for mutation, this technique randomly
chooses a new value for the bit from the set of possible values.
Statistics - This category defines the statistics to collect and report about
the GA experiment.
Basic Stats - This technique provides information about population convergence,
the best and average performance of the current population as well as on-line
(average of all solutions tested) and off-line (average of the best solution
for each generation) performance measures. See DeJong (1975) for a discussion
of on-line and off-line performance measures.
{tc "Selecting the Problem to Solve" \L 2}Selecting the Problem to Solve
The Select Problem option on the main menu allows you to select the problem the
GA will attempt to solve. Pressing the return key when Select Problem is
highlighted on the main menu will bring up a screen divided into two parts.
The top of the screen displays the name of the problem that is currently
selected. The bottom portion of the screen lists all the problems that are
currently defined in the system. To change the problem, use the up and down
arrow keys to highlight the desired problem, in the bottom portion of the
screen, and press the return key to select the problem. After you have
selected the desired problem, press CTRL-U to return to the main menu.
{tc "Description of Available Problems" \L 3}Description of Available Problems
Counting 1s - In this problem, the GA is attempting to maximized the number of
1s in the chromosome string. The fitness value is the number of 1s in the
chromosome string. (File func.ct1, minimization problem)
Partially Deceptive - This is an artificially created partially deceptive
problem. This problem consists of ten 4-bit sub-problems distributed at
positions i , i + 10, i + 20, i + 30, for i = 1 to 10 on the chromosome
string. For each 4-bit sub-problem, the order-1 and order-2 schemata are fully
deceptive, but the order-3 schemata are not deceptive. (File func.d4,
maximization problem) See Whitley (1991) for a discussion of deception in GAs.
DeJong's F1 through F5 are the five problems from DeJong's test suite. (See
DeJong 1975 for a description of these problems.)
Ackley's Fig. 1-7 - This problem (Ackley 1987, 16-17) is basically a flat space
with several very steep hills. The problem is defined by the following
function: {EMBED Equation |}. (File func.fig7, maximization problem)
Noisy Ackley's Fig. 1-7 - This problem is similar to Ackley's Fig. 1-7 except
that a noise factor is used to alter the value of the function in about half
the chromosomes evaluated. The noise factor is created by generating a random
number in the range [0, 1) and using it as a percentage of how much the actual
function value should be altered (i.e., noise factor = random number * function
value). The GA randomly chooses to add or subtract the noise factor to the
actual function value. (File func.nfig7, maximization problem)
Noisy Shifted 3-D Parabola - This problem also incorporates a noise factor in a
space defined by the following function: {EMBED Equation |}. This space is a
3-dimensional parabola shifted slightly off the origin. The noise factor is
defined by the following: noise factor = (random number / 2) * function value.
The GA randomly chooses to add or subtract the noise factor to the actual
function value. On average, half of the function evaluation will be altered by
the noise factor. (func.npar, minimization problem)
Shifted 3-D Parabola - The space of this problem is a 3-dimensional parabola
shifted slightly off the origin and defined by the following function: {EMBED
Equation |}. (File func.par, minimization problem)
Select/Replace Test - This problem was used to observe the characteristics of
various combinations of selection and replacement techniques. The function
used for this problem is the following: {EMBED Equation |}. (File func.repl,
maximization problem)
TSP-Swap - This is a 16 city Traveling Salesman Problem where we are attempting
to find the shortest tour between 16 cities in the continental US traveling by
automobile. In this problem, we use the information in the chromosome to tell
us how to create a permutation of the list of cities to visit. Each pair of
genes on the chromosome, tell us which two cities to swap to create a different
tour and perhaps located the optimum tour. (File func.swap, minimization
problem)
TSP-New-Swap - This problem is similar to the TSP-Swap problem, but we
interpret the information in the chromosome slightly different. Instead of
using pairs of genes for swapping instructions, we use the gene position and
the gene value. For example, if the first gene has a value of 5 then we swap
the cities at the first and fifth position on our list, if the second gene has
a value of 4 then we swap the cities at the second and fourth position on the
list, and so on. (File func.swp, minimization problem)
{tc "Creating an Executable GA Program" \L 2}Creating an Executable GA Program
After you have selected the component to use for each GA phase and the problem
the GA will be applied to, you can create the executable GA program. Use the
up and down arrow keys to select the Make GA option from the main menu and
press the return key. A dialog box will appear asking you to give a name for
the executable GA program. After providing a name the system will begin
re-compiling the necessary program code. This may take several minutes
depending on which components were changed, so please be patient. When the
executable program has been created a message will appear in the dialog box
informing you that the program was successfully created. Pressing the return
key after receiving the success message will return you to the main menu.
Compilation error should not occur if you are using the components original
provided with the GA Toolkit. However, if you have incorporated additional
components or modified the components in the Toolkit compilation errors could
occur if you have made an error in the program code. If errors were
encountered during the re-compiling process a message will appear that will
inform you of this event. This message will also tell you about a file that
contains all the message produced during the re-compilation process. Pressing
the return key while this message is displayed will return you to the main menu
where you should exit from the user interface program. By examining the file
containing the messages from the re-compilation process, you should be able to
locate where the error occurred and be able to correct it.
Before you can execute the program you have just created you need to create
several input files. This process is described in the next section. In
addition, you should refer to Chapter 4 for a discussion of how to execute your
GA program and a description of the output files created by the GA program.
{tc "Creating Input Files" \L 2}Creating Input Files
Before you can execute the GA program you will need to create two input files.
The in file contains various user defined parameters for controlling the GA
experiment such as the maximum number of trials and the crossover and mutation
probabilities. The genes file contains information on how to interpret the
chromosome string for the problem the GA is attempting to solve. For example,
the genes file specifies the number of genes in the chromosome and the format
for displaying the gene values. This section describes how to create these
input file through the user interface program. In addition, this section also
briefly describes the contents of each file.
You begin the input file creation process by selecting the Create Input Files
option on the main menu and pressing the return key. This will bring up a
dialog box asking you to provide a file suffix for the input files. When you
execute your GA program, the program will, by default, look for the "in" and
"genes" input files. However, providing the GA program with the file suffix
will cause the program to read the information from the input files with the
specified suffix. For example, if you use the suffix "p1" (for problem 1) then
the GA program will take the input information from the "in.p1" and "genes.p1"
input files. Using a file suffix is helpful if you want to experiment with
several problems with different input parameters and different chromosome
formats. If you want to use a file suffix, then type a suffix in the space
provided and press the return key. Otherwise, just press the return key to
accept the default value (no file suffix).
Then the Setup Input Files screen is displayed. This screen allows you to
choose which file (in or genes) to create. Use the up and down arrow keys to
highlight the desired option and press the return key to select the option.
The following describes how to create the in file and describes the various
parameters contained in this file. The Creating the "genes" File section will
describe the genes file and its contents.
{tc "Creating the \"in\" File" \L 3}Creating the "in" File
Selecting the in file option will bring up a screen divided into two parts.
The left side of the screen contains a menu with Edit and Save options. The
right side of the screen lists the various user-defined parameters and default
values for each. The default values are not the product of any scientific
study to develop optimum parameter values--they are only reasonable settings
for each parameter. Depending on the type of problem you want to solve the
default values may or may not be appropriate. Feel free to change the value of
any or all parameters.
To edit the values of the parameters, press the return key when the Edit option
is highlighted. A parameter on the right side of the screen will be
highlighted (usually the first parameter on the screen). Use the up and down
arrow keys to highlight a parameter. The SHIFT-U and SHIFT-D key combinations
can also be used to scroll up or down an entire screen. Pressing the return
key will bring up a dialog box asking you to enter a value for the highlighted
parameter. If you enter an invalid value for the parameter a dialog box will
appear explaining the appropriate values. After you have read this
information, pressing the return key will allow you to return to the previous
dialog box and enter an appropriate value. You can change as many parameters
as you want by highlighting the parameter and pressing the return key. If you
want more information about the parameters, pressing the F1 key will bring up a
help dialog box that contains short descriptions of all of the parameters.
When you have finished adjusting the parameter values, use the CTRL-U key
combination to return to the menu on the left side of the screen. Then you can
select the Save option to save your parameter settings. To exit from the in
file creating section press CTRL-U again and you will return to the Setup Input
Files screen that contains the menu where you selected which file to create (in
or genes).
{tc "Description of Input Parameters" \L 3}Description of Input Parameters
Total Experiments - This is the number of experiments you would like to conduct
with this set of input parameters. That is, the number of complete GA runs
where the GA converges to solution or exceeds the maximum trials. In each
experiment, the GA starts with the same initial population and then the random
number generator is seeded with a different value so that a different set of
choices will be made. Total Experiments should be an integer number greater
than zero.
Total Trials - This is the maximum number of trials you want to occur before
the GA is stopped. Total Trials should be an integer number greater than zero.
Population Size - This is the number of chromosomes or sample points of the
search space you want the system to work with. Population Size should an even
integer number greater than zero. (The number must be even because two parents
are selected and always produce two offspring.)
DeJong Gap - This is used with the DeJong Gap with Single Elitism replacement
policy. This parameter should be a real number in the following range [0.0,
1.0]. The Gap measure specifies what percentage of the population should be
created via the genetic operators (selection, crossover and mutation) and the
remaining portion of the population are randomly selected members of the
previous generation. For example, a Gap of 0.8 would mean 80% of new
generation is created by selection, crossover and mutation and the remaining
20% are copies of randomly selected members of the previous generation.
Elite Gap - This is used with the Variable Elitism and the Reverse Rank
replacement policies. This parameter should be a real number in the following
range [0.0, 1.0]. Elite Gap specifies the percentage of the population
reserved for the best members so far. For example, suppose we have a
population of 50 chromosomes and an Elite Gap of 0.1. This means we want to
preserve the best 10% or 5 members.
Prob. Cross - This is the crossover probability or how often crossover should
occur. This should be a real number in the following range [0.0, 1.0].
Prob. Mutation - This is the mutation probability or how often mutation should
occur. This should be a real number in the following range [0.0, 1.0].
Selection Bias - This is used with the Linear Rank selection technique. This
is how much preference should be given to the better members of the population.
Selection Bias should be a real number in the following range (1.0, 2.0]. For
example, a selection bias of 1.5 means the best member of the population is 1.5
times more likely to be selected than the median ranked individual of the
population.
Replacement Bias - This is used with the Reverse Rank replacement policy. This
is how much pressure should be placed on replacing the worst individuals in the
population. Replacement Bias should be a real number in the following range
(1.0, 2.0]. For example, a Replacement Bias of 1.5 means the worst individual
is 1.5 times more likely to be replaced than the median ranked individual.
Maximum Spin - This is used to stop the experiment if no progress is being
made. If a new chromosome has not be created in (Maximum Spin) generations
then the population has probably converged and no real progress is being made
so we should stop this experiment. This parameter should be an integer number
greater than zero.
Statistics Interval - This is how many trials should occur between printing
experiment statistics. This parameter should be an integer greater than zero.
Statistics are only calculated after (Population Size) attempted trials or
roughly each generation. The number of actual trials (when a chromosome needs
to be evaluated) may be less than the attempted trials. So the Statistics
Interval is a lower bound on the number of trials that must occur before
another set of statistics is printed. That is, the actual number of trials
before the reporting of statistics may be slightly more than the specified
Statistics Interval.
Hamming Metrics - This is a boolean flag for whether the clustering and drift
Hamming metrics (Frederick, Sedlmeyer and White 1991) for population
convergence should be calculated and reported. Clustering measure the average
Hamming distance between all pairs of chromosomes in the population. And drift
measures the average Hamming distance between the best individual and all other
chromosomes in the population.
Best Size - This is how many of the best chromosomes developed should be saved
and printed to a file (the best file described in Chapter 4) at the end of the
experiment. This parameter should be an integer greater or equal to zero. If
Best Size is greater zero, then the (Best Size) members with the best fitness
values developed in the entire experiment are preserved in a separate area for
reporting purposes only, regardless if any elitism policy is used in the other
phases of the GA.
Worst Size - This is used as a scaling mechanism with SUS selection. This is
similar to the scaling window used in the GENESIS GA program. This parameter
should be an integer number greater or equal to zero. If we are minimizing a
numerical function, then the fitness of a member (used for the selection
process) is the worst fitness value minus the function evaluation for this
particular member. The worst fitness value is the worst member seen in the
last (Worst Size * Population Size) trials, if Worst Size is greater than 0.
This allows more differentiation between members whose fitness values are very
close together. The smaller the Worst Size (greater than zero) the greater the
differentiation. For a Worst Size of zero, the worst value is the worst member
seen in the entire experiment thus far. This represents an infinite scaling
window or very little ability to differentiation between very similar values.
This scaling effect works analogously for maximization problems as well.
Maximize - This is a boolean flag to indicate whether the evaluation function
should be maximized or minimized. Turn the flag on (or yes) for maximization
and off (or no) for minimization. Note: If you are setting parameters for one
the problems provided with the GA Toolkit make sure you set this flag
correctly, otherwise you will get unexpected results. The Description of
Available Problems section above states whether each problem should be
maximized or minimized.
Evaluate All - This is a boolean flag to tell the system whether to evaluate
all chromosomes or only those that need evaluation. Generally, if a new
chromosome (offspring) is equal to one of its parents we do not need to
re-evaluate the function again, but simply copy the function parameters and
fitness measure from the matching parent. However, in noisy or non-stationary
functions the same chromosome may have a different function evaluation at
varying times, so all chromosomes inserted into the next generation need to be
re-evaluated. Setting this flag on (or yes) forces all chromosomes to be
evaluated at the end of each generation. If this parameter is turned off (or
no), new chromosomes are only evaluated if they differ from their parents.
Seed Population - This is a boolean flag for whether the initial population
should be seeded with specific chromosomes from an input file. Setting the
flag on (or yes) will cause the program to read the init file (or the
init.suffix file if you provide the GA program a file suffix). This file
should contain the chromosome strings you want to place in the initial
population with one chromosome per line. This file can contain any where from
one to an entire population of chromosomes.
Debugging Level - This is used for tracing the procedure calls activated during
the GA run. The debugging feature can produce and a lot of information that is
not very interesting, but may be helpful in debugging a new technique. I would
recommend always setting this flag to 0 (no debugging trace). However, if you
are interested in this information a value of 1 shows entry and exit of all
major procedures. A value of 2 produces the same information as a value of 1
but also includes entry into class constructors and destructors. And finally a
value of 3 shows the information for a value of 2 plus some addition
information about major events that are occurring.
Display - This is a boolean flag to indicate whether to show, on the screen,
information about the GA's execution. This display includes information about
the current trial, experiment statistics, the best chromosome so far and which
GA components are being utilized. Setting the flag to on (or yes) turns the
display mode on.
Print Generations - This is a boolean flag to determine whether the population
should be printed to a file (the pops file described in Chapter 4) at the end
of each generation. Turning this flag on (or yes) will cause this information
to be printed.
Random Seed - This is the seed for the random number generation procedures.
This parameter should be a long integer greater or equal to zero.
Swapping Prob. - This is the swapping probability for Uniform crossover. This
determines how frequently the positions on the chromosome string will be
swapped. This should be a real number in the following range [0.0, 1.0].
{tc "Creating the \"genes\" File" \L 3}Creating the "genes" File
The genes file describes the structure of the chromosome string. This file
specifies how many genes the chromosome contains, the length of each gene and
the format for displaying the value of each gene. If you are applying a GA to
one of the pre-defined problems that was installed with the GA Toolkit, then
you do not need to create the genes file. The genes files for these problems
have already been created and are located in the Input sub-directory. The
suffix of the genes file corresponds the suffix of the file where the problem
is defined (e.g., genes.ct1 is the genes file for the Counting Ones problem
defined in the func.ct1 file).
If you need to define a new genes file the following describes the process for
accomplishing this task. Selecting the genes file from the Setup Input Files
screen and pressing the return key brings up a dialog box asking whether you
will be using the floating point representation. If the problem the GA is
being applied to is defined using the floating point decoding mechanism (i.e.,
the eval_chrom class that defines the problem is derived from the float_chrom
class), then you should answer yes to this question. The answer to the
floating point representation question determines what type of information is
required in the genes file.
After providing an answer to the floating point representation question, a
screen divided into two sections will appear. The left side is again a menu
that allows you to add, edit or delete gene specifications and save the
information to the genes file. The right portion of the screen is a table
format for displaying the gene specifications that you will create. For
floating point representation, the table contains fields for the gene number,
the minimum and maximum value of the floating point range, the number of values
in the range, the format for printing the gene value and the length of the gene
on the chromosome string. When you are not using the floating point
representation the table contains three fields: the gene number, the format
for printing the gene value and the length of the gene.
Adding Genes
The procedure for specifying the information in the genes file is similar
whether or not the floating point representation is being used. The only
difference is the fields that need to be specified. To add information about a
gene highlight the Add Gene option in the menu on the left side of the screen
and press the return key. Then the system will ask you to provide a value for
each to the appropriate fields. The following describes each field you will be
required to specify.
Fields for Floating Point Representation
Minimum Value - this is the smallest value in the desired floating point range.
Maximum Value - this is the largest value in the desired floating point range.
Number of Values - this is the number of values in the floating point range and
defines the precision of the values. The value for this field must be a power
of 2 (e.g., 2, 4, 8, 16, 32, 64, ...). For example, suppose the minimum value
is -5.11, maximum value is 5.12 and the number of values is 1024, then the gene
can represent values in the following range [-5.11, -5.10,
-5.09, ... , 5.10, 5.11, 5.12] and the precision is 0.01.
Format - this is the printf conversion string for printing the value of the
gene. The gene values are store in an array of type double. The following are
valid conversion strings:
%f - for floating point conversion (e.g., 5.043245 using %.3f is printed as
5.043).
%e and %E - for scientific notation (e.g., 504.3245 using %.4e is printed as
5.0432e+02 and
using %.4E is printed as 5.0432E+02).
%g and %G - %e or %E is used if the exponent is less than -4 or greater than
or equal to the precision; otherwise %f is used (e.g., 504.3245 using %.5g is
printed as 504.32 and
50432.45 using %.3g is printed as 5.04e+04).
For more explanation of printf conversion strings refer to a C or C++
programming manual. Note that since the values of the genes are stored as type
double, the only valid conversion characters are f, e, E, g and G.
Length - this field will automatically be calculated for you by using Number of
Values field. For example, it the number of values field is 64 then the length
is 6 positions on the chromosome string (log(64) base 2 = 6).
Repetitions - After you have provided values for all the required fields the
system will ask you how many repetitions of the specification you wish to
create. If you have multiple genes with the same specification you will not
need to repeatedly enter the same information over and over again, just specify
the number of copies to create.
Fields for Non-Floating Point Representation
Format - this is again the printf conversion string for printing the gene
value, similar to the format field with floating point representation (see
above).
Length - this is the number of positions on the chromosome string that the gene
will occupy.
Repetitions - similar to the floating point representation you can instruct the
system to create multiple copies the gene specification you have just created.
After you have finished the gene specification, the values for each field will
be displayed in the table on the right side of the screen. If you create more
than a screen full of genes then you can use SHIFT-U and SHIFT-D to scroll up
or down to view the genes you have created. At this point you can edit or
delete a gene you have created or add additional genes.
Editing a Gene
To edit a gene specification, highlight the Edit Gene option and press the
return key. A dialog box will appear asking you for the number of the gene you
wish to modify. The gene number is the first column in the table on the right
portion of the screen. After you have provided the gene number a pop-up menu
appears that contains the fields which you may edit. Using the up and down
arrow keys to highlight the field to modify and press the return key. Another
dialog box will appear asking you to enter a new value for the field. From the
pop-up menu of fields you can modify any or all fields of that particular gene.
When you are finished modifying the gene press CTRL-U to return to menu on the
left side of the screen.
Deleting a Gene
If you made a mistake and want to completely remove a gene, highlight the
Delete Gene option and press the return key. A dialog box will appear asking
you for the number of the gene you wish to delete. Once you remove a gene the
only way to undo this action is re-specific the gene through the Add Gene
option on the menu.
Saving the genes File and Exiting
And finally, when you are finished creating the information about the genes,
use the Save option to write this information to the genes file. To exit the
genes specification section press the CTRL-U key combination. This will return
you to the Setup Input Files screen, where you selected which file to create
either the in or genes file. If you have already created both input files you
can use the CTRL-U key combination to return to the main menu of the user
interface program.
{tc "Defining a New Problem" \L 2}Defining a New Problem
The Create New Problem option on the main menu allows you to create a template
file for defining the evaluation function for a new problem. Pressing the
return key when the Create New Problem option is highlighted initiates this
option. A series of dialog boxes will appear asking you questions about the
new problem.
You will first be asked to enter a suffix for the file that will contain the
new problem. All evaluation functions are keep in files with "func" as the
root name of the file. The file suffix allows the system to identify the
proper file when you select a particular problem to solve. (Note: You should
use a unique suffix otherwise the file for a previously defined problem will be
replaced with the new file you are creating. See the Description of Available
Problems above for the suffixes that are currently being used.)
The second dialog box that will appear asks you to provide a short descriptive
name for the problem. The name you provide is the description that will appear
in the list of problems when you select the problem to solve through the Select
Problem option on the main menu.
The final question that you will be required to answer is the type of values
that will be required by the new evaluation function. The system needs to know
the type of values that are needed so that it can set up the appropriate
representation and decoding mechanism. In this dialog box you have a menu of
four choices. You can use the up and down arrows to highlight the desired type
of values and press the return key to select that option. The following
describes the four types of values.
Properties of Chromosome String - this option is for evaluation functions that
do not require the genes on the chromosome to be decoded as integers or
floating point values. The fitness of the chromosome will be determined by
some type of property of the binary chromosome representation. For example,
the Counting Ones Problem would use this type of value since the fitness is
related to the number of 1s in the chromosome string.
Integers - Binary Decoding - this option is for evaluation functions that
require integer parameters where you want the genes to be interpreted as binary
code. That is, the genes on the chromosome string with be converted from the
binary representation to decimal integers.
Integers - Gray Decoding - this is similar to the previous option but the genes
on the chromosome string will be interpreted as Gray code rather than binary
code.
Floating Point Values - this option is for evaluation functions that require
floating point values. In the case of floating point values you do not need to
specify the decoding mechanism (i.e., binary or Gray) that will be used when
you define the problem. The decoding mechanism can be selected through the
Select GA Components option on the main menu when you create the GA program
that will be applied to this problem.
After you have chosen the type of values that are required for the new problem,
a message will appear telling you that the file for this problem has been
successfully created and that you need to edit the file to add the program code
for the evaluation function. Pressing the return key while this message is
displayed will return to the main menu. If you want the edit the new file you
will need to exit the user interface program.
The files for all the evaluation functions are keep in the Source/Funcs
sub-directory of the main directory where the GA Toolkit was installed. To
edit the new file you will need to go to this directory. Your new file will
have the "func" root name and the suffix you provided (e.g., func.test would be
the file created if you specified "test" as the suffix). You can use any text
editor to add the program code for your new evaluation function. The file you
created will look similar to the following. The contents of the actual file
are printed in a Courier font and additional comments about the file are
printed in a Times Roman font.
/* GA Toolkit - Sara A. Lienau
*
* Center for Advanced Technology
* Computer Science and Telecommunications Program
* University of Missouri - Kansas City
* ------------------------------------------------------------------------
* file: eval_chr.temp
*
* purpose: eval_chrom class template
*
* modified: 21 aug 91 - created
*/
#include "extern.h"
#include "chrom.h"
#include "eval_chr.h"
// It may be helpful to document some information about the function
// here. This is entirely optional.
/* Function Description
The Name/Source item contains the short descriptive name that you specified for
this problem. This is the name that will be displayed on the list of available
problems from the Select Problem section of the user interface program.
* Name/Source: Test Problem 1
* Description:
* Properties:
* Min or Max:
* Min:
* Max:
* Limits:
* Space:
The END_CHROM item is used to specify which class the eval_chrom class will be
derived from. This determines what type of values will be provided to the
evaluation function. If you need to change the type of values that you
originally specified you should use one of the following class names:
bit_chrom, gray_chrom or float_chrom. The bit_chrom class should be used for
values from the properties of the chromosome string or integer values using
binary decoding. The gray_chrom class should be used for integer values using
Gray decoding. And the float_chrom class should be used for interpreting the
genes as floating point values.
* END_CHROM: gray_chrom
* GENES info in:
*/
/*------------------------------------------------------------------------
-- class eval_chrom (objective function)
------------------------------------------------------------------------*/
The get_name function is used as a kind of sanity check. This function returns
the name of the problem being used and is printed in an output file (the log
file described in Chapter 4). This is useful to confirm that you are solving
the problem you intended.
char* eval_chrom::get_name(void){
// get name - returns a string describing the evaluation function
// and chromosome representation
char name[60];
strcpy(name, "Test Problem 1, ");
strcat(name, END_CHROM::get_name());
return( name );
}
The eval function is where you will insert the program code for calculating the
fitness of each chromosome.
void eval_chrom::eval(void){
// apply evaluation function to chrom
The decode function is used to decode the genes of the chromosome string into
the appropriate type of values for your evaluation or fitness function. As
noted in the comments in the file, you only need to call the decode function if
the evaluation function requires floating point or integer parameters. If you
are calculating the fitness value on some property of the chromosome string,
you can comment out or remove the call to the decode function.
// decode bit string first (only necessary if func. requires floating
// point or integers)
END_CHROM::decode();
/*---- BEGIN MODIFY ----*/
/* Insert code for evaluation function here. Final value MUST be
* assigned to the variable "fitness".
*
* Summary of Variables
* str - Chromosome Bit String
* array of unsigned short int
* LCHROM - size of "str" array
* vect - Decoded Genes of Chromosome if decode function called
* array of double
* GENES - size of "vect" array
*
* CAUTION: str, LCHROM & GENES should be treated as read-only
* variables. Assigning values to these variables may cause
* unpredictable behavior. However, you may assign values to the
* elements of the "vect" array if you want something other than
* the decoded genes to be printed by the chromosome print function.
*/
The above comments give a summary of the variable you have access to for
calculating the fitness of the chromosome. As an example suppose we want to
optimize the function {EMBED Equation |} and we have properly defined three
genes (one for each parameter) in the genes file. The following is an example
of how to code the fitness function. The decode function above will be called
to place the decoded values of the genes in the "vect" array.
double x, y, z;
// vect[0] is the x parameter, vect[1] is the y parameter and vect[2] is the z
parameter
x = vect[0] * vect[0]; // x^2
y = vect[1] * 3; // 3*y
z = vect[2] - 2;
z = z * z; // (z-2)^2
fitness = x + y + z;
/*---- END MODIFY ----*/
// clear eval flag
do_eval = 0;
}
After you have written the program code for the new problem you can apply a GA
program to this problem by selecting it through the Select Problem option on
the main menu.
Chapter 4
{tc "4. EXECUTING YOUR GA PROGRAM"}Executing YOUR GA Program
This chapter describes how to execute the GA programs you create through the
user interface program and the output files produced by the GA program.
The GA program works with several input files and creates several output files.
You can direct the GA program to used the default names for these files (i.e.,
file with no suffix) or used the files with a specified suffix. To execute the
GA program use the name you assigned to the program as the command and you may
supply an optional parameter for the file suffix (i.e., % ga_program_name
[suffix]). For example, if you selected the default name for your GA program
(i.e., GA.EXE) and you want the program to read input files with the "test"
suffix (i.e., in.test and genes.test) and create output files with this suffix
use the following command.
% GA.EXE test
If you do not supply the optional command line parameter the GA program will
look for the default input files (i.e., in and genes) and the program will use
the default names for the output files created during the run (i.e., best, log,
out, and pops). The file suffix options allows you to save the input and
output information about different experiments without continually renaming the
files used by the system.
{tc "Description of Output Files" \L 2}Description of Output Files
best The best file contains the specified number of the best chromosomes
developed during each experiment. The number of chromosome printed in this
file is controlled by the Best Size parameter in the in file. This file is
only created if the Best Size parameter is greater than zero. This file
contains the chromosome string, the value of each gene, the fitness value and
the generation and trial when the chromosome was first created. The following
is an example of this file.
Experiment: 1
101101010101001110100101110011
2.13 -1.98 -1.41
0.5954 10 193
The first line is the chromosome string. The second line show the values of
the genes defined for this chromosome. The last line contains the fitness
value (0.5954) and the generation (10) and trial (193) when this chromosome was
created, respectively.
log The log file contains general information about the GA experiment. It
contains the date and time the experiment was conducted, most of the values of
the input parameters and the GA components used for each phase of the GA
process. The following is an example of a log file.
Experiment Conducted: Sun May 3 13:29:10 1992
Input Parameters:
Total Experiments = 1
Total Trials = 200
Population Size = 20
Chromosome Length = 30
DeJong Gap = 1.0000
Elite Gap = 1.0000
Prob. Cross = 1.0000
Prob. Mutation = 0.0010
Selection Bias = 1.5000
Replacement Bias = 1.5000
Maximum Spin = 3
Best Size = 1
Worst Size = 3
Maximize = 0
Random Seed = 123456789
Genetic Algorithm Components
Chromosome: Shifted 3-D Parabola, Floating Pt, 1-D Bit Str.
Reproduction: Generational
Selection: GENESIS Rank
Replacement: DeJong Gap w/Single Elitism
Crossover: 1 PT Reduced Surrogate
Mutation: Canonical Mutation
Statistics: Basic Stats
out The out file contains the experiment statistics that are printed after each
Statistics Interval (input parameter in the in file). At each Statistics
Interval the following information in printed.
Current Generation - this is the number of generational cycles completed.
Current Trial - this is the number of trials or chromosomes created so far.
Lost Allele - this is the number of allele (positions on chromosome) where the
entire population has the same value.
Converged Allele - this is the number of allele where at least 80% of the
population has the same value.
Allele Bias - this is the average percentage of the most prominent value for
each allele. For example, a bias of 0.70 means that, on average, each position
has converged to 70% 0s or 70% 1s. The minimum bias is 0.5 when there is an
equal distribution of 1s and 0s for each allele.
On-line performance - this is the mean of all solutions tested to this point in
the run. (DeJong 1975)
Off-line performance - this is the mean of the best solutions developed at the
end of each generation. (DeJong 1975)
Best - this is the fitness value of the best solution in the current
population.
Average - this is the average fitness of the population at this point in the
experiment.
If the Hamming Metrics flag is turned on then the On-line and Off-line
performance fields are replaced by the Clustering and Drift metrics,
respectively.
Clustering - this is the average Hamming distance between each pair of
chromosomes in the population. A value near 1.0 indicates a lot of diversity
in the population. As the clustering metric moves toward 0.0 the population is
converging and losing diversity. (Frederick, Sedlmeyer and White 1991)
Drift - this is the average Hamming distance between the best chromosome and
all other chromosomes in the population. Similar to the clustering metric, a
value near 1.0 indicates a lot of diversity and a value of 0.0 means the
population has completely converged to a single chromosome. (Frederick,
Sedlmeyer and White 1991)
The following is an example of the out file.
Experiment: 1
Gen Trials Lost Conv Bias OnLine Offline Best Average
0 20 0 0 0.582 3.2093e+01 1.1283e+01 2.874900e+00 3.209285e+01
2 56 0 2 0.635 2.8387e+01 5.8203e+00 2.750900e+00 2.168309e+01
4 94 0 6 0.670 2.3457e+01 4.4480e+00 9.473000e-01 1.582167e+01
6 134 0 8 0.687 1.9270e+01 3.3912e+00 9.033000e-01 7.569095e+00
8 170 3 13 0.732 1.6394e+01 2.8505e+00 8.090000e-01 4.364355e+00
10 208 3 18 0.772 1.4202e+01 2.4559e+00 5.954000e-01 4.710895e+00
pops The pops file contains a printed copy of all chromosomes in each
generation. This file is only created if the Print Generation flag is turn on
in the in file. Depending on the population size and the maximum number of
trials this file can be very large. The following shows the format of this
file.
Experiment: 1
GENERATION: 0
111110010010000101011000101011
4.84 0.21 0.43
16.8946 0 1
001101111100110001010011010010
-2.89 -3.15 -3.02
27.6750 0 2
010101001000111011111110010101
-1.74 -2.73 4.05
52.1030 0 3
Each chromosome printed in the pops file has the same format as those printed
in the best file. The first line contains the chromosome string. The second
line displays the value of each gene defined for the chromosome. And the last
line contains the fitness value and the generation and trials when the
chromosome was created, respectively.
CHAPTER 5
{tc "5. PROGRAMMER'S REFERENCE"}Programmer's Reference
{tc "Description of Classes" \L 2}Description of Classes
Chromosome Classes
base_chrom - This class is the root of the chromosome hierarchy of classes.
The base_chrom class provides several data items that are common to all
chromosomes regardless of their actual implementation, such as a variable for
the fitness value. (Files chrom.h & chrom.C)
bit_chrom - The bit_chrom class implements a one-dimensional binary string
chromosome representation. This class is derived from base_chrom and provides
services for the other GA phases to access and modify the chromosome structure.
(Files chrom.h & chrom.C)
gray_chrom - The gray_chrom class is derived from bit_chrom and overloads the
decoding method so that the chromosome can be interpreted as Gray code (rather
than binary). (Files chrom.h & chrom.C)
float_chrom - The float_chrom class provides a decoding mechanism that allows
the chromosome to be decoded as floating point values rather than integers.
The decoding procedure first decodes the chromosome to integers and then maps
the integer value to a floating point range. This class can be derived from
either bit_chrom or gray_chrom so that the chromosome can be interpreted as
binary or Gray code. (Files chrom.h & chrom.C)
eval_chrom - The eval_chrom class implements the problem specific evaluation or
fitness function. This class can be derived from bit_chrom, gray_chrom or
float_chrom depending on what type of values the evaluation function requires
(i.e., integers or real numbers). (File eval_chr.h & eval_chr.C)
Reproduction Classes
base_pop - The base_pop class is the root of the population hierarchy of
classes. It implements data structures and procedures that are common to both
generational and individual reproduction. For example, both style of
reproduction need at least on population data structure and they share common
initialization and termination procedures for the basic execution cycle of the
GA process. (Files populat.h & populat.C)
pop_gen - The pop_gen class is derived from base_pop and implements the basic
execution cycle for generational reproduction. (Files pop_gen.h & pop_gen.C)
pop_ind - The pop_ind class is derived from base_pop and implements the basic
execution cycle for individual reproduction, where one or two offspring are
created and immediately inserted into the population. (Files pop_ind.h &
pop_ind.C)
Selection Classes
rank_sel - The rank_sel class implements Linear Rank selection and can be used
with either generational or individual reproduction. (Files rank_sel.h &
rank_sel.C)
rank_sus_gen - The rank_sus_gen class implements GENESIS Rank selection for
generational reproduction. (Files rk_sus_gen.h & rk_sus_gen.C)
rank_sus_ind - This class implements GENESIS Rank selection for individual
reproduction. (Files rk_sus_ind.h & rk_sus_ind.C)
sus_gen - The sus_gen class implements SUS selection for generational
reproduction. (Files sus_gen.h & sus_gen.C)
sus_ind - This class implements SUS selection for individual reproduction.
(Files sus_ind.h & sus_gen.C)
Replacement Classes
base_replace_g - The base_replace_g class is the root of the hierarchy for
generational replacement policies. This class implements a shared procedure
for storing new offspring until an entire generation has been created. (Files
repl_gen.h & repl_gen.C)
gen_gap_e1 - The gen_gap_e1 class is derived from the base_replace_g class and
implements DeJong's Gap with Single Elitism replacement policy for generational
reproduction. (Files repl_gen.h & repl_gen.C)
gen_e_n - The gen_e_n class is derived from the base_replace_g class and
implements the Variable Elitism replacement policy for generational
reproduction. (Files repl_gen.h & repl_gen.C)
killer - The killer class implements the Remove Worst (kill_worst procedure),
Reverse Rank (rev_rank procedure) and Reverse Tournament (rev_tourn procedure)
replacement policies for individual reproduction. (Files killer.h & killer.C)
repl_rank - The repl_rank class implements tasks that Linear Rank selection,
when used with individual reproduction, requires the replacement phase to
accomplish (i.e., inserting new offspring in the proper order according to
their rank). This class is derived from the killer class so that the
replacement policy can be varied independent of the additional tasks that the
selection technique requires. (Files repl_rank.h & repl_rank.C)
repl_rk_sus - The repl_rk_sus class implements tasks that GENESIS Rank
selection, for individual reproduction (rank_sus_ind), requires the replacement
policy to accomplish. This class is derived from the killer class. The
repl_rk_sus class also implements the Generational Emulation replacement policy
(the kill_term procedure). (Files repl_rk_sus.h & repl_rk_sus.C)
repl_sus - The repl_sus class implements tasks that SUS selection, for
individual reproduction (sus_ind), requires the replacement phase to
accomplish. This class is derived from the killer class. The repl_sus class
also implements the Generational Emulation replacement policy (the kill_term
procedure). (Files repl_sus.h & repl_sus.C)
Crossover Classes
n_pt - The n_pt class implements Traditional crossover where the cut points are
randomly selected. (Files n_pt.h & n_pt.C)
reduced_sur - The reduced_sur class implements the Reduced Surrogate crossover
technique. (Files reduced.h & reduced.C)
one_pt - The one_pt class can be derived from either n_pt or reduced_sur to
provide a one point crossover procedure for either Traditional or Reduced
Surrogate crossover, respectively. (Files cross.h & cross.C)
two_pt - The two_pt class similar to the one_pt class can be derive from either
n_pt or reduced_sur for a crossover technique where two cut points are
selected. (Files cross.h & cross.C)
uniform_xo - The uniform_xo class implements Uniform crossover. This class is
derived from n_pt which provides the procedure for determining when crossover
should be performed according to the crossover probability. But the uniform_xo
class overloads the procedure for selecting the actual cut points. (Files
unixo.h & unixo.C)
Mutation Class
can_mute - The can_mute class implements the Canonical mutation technique.
(Files can_mute.h & can_mute.C)
Statistics Classes
basic_stat - The basic_stat class is used to collect and report various
statistics about the GA experiment. (Files stat.h & stat.C)
extreme_perfs - The extreme_perfs class is used by the basic_stat class for
monitoring the best and worst performances or solutions developed by the GA.
The best individuals may be printed to a file at the end of each experiment.
The worst solutions developed are used as a scaling mechanism with
fitness-based selection techniques (e.g., SUS). (Files stat.h & stat.C)
{tc "Description of #define Directives" \L 2}Description of #define Directives
With the GA Toolkit the user is easily able to exchange the components used for
each phase of the GA process. To exchange components without manually
modifying the program code, #define preprocessor directives of the C++ language
were used where a particular class name needed to specified in the source code.
Different techniques are easily exchange by changing the class name that is
associated with the appropriate #define identifier. For example, the SELECT
identifier specifies which selection technique the user desires. So when we
needed to create an instance of the a selection class in the source code, we
used the SELECT identifier name rather than the actual class name. Then we can
easily change which selection technique is used by associating a different
class name with the SELECT identifier. For example,
#define SELECT sus_ind // SUS selection for individual reproduction
// create instance of a selection class
SELECT select_obj;
// rather than
sus_ind select_obj;
The following describes the purpose of each #define identifier and the class
names that may be associated with each identifier.
#define
Identifier
Valid Class
Names
Description
REP_CHROM
bit_chrom
The REP_CHROM identifier specifies the representation chromosome class.
(Located in file def_rep.h)
CODE_CHROM
bit_chrom
gray_chrom
The CODE_CHROM identifier specifies the class that implements the desired
method of chromosome decoding . (Located in file def_rep.h)
END_CHROM
bit_chrom
gray_chrom
float_chrom
The END_CHROM identifier specifies the class that the eval_chrom class will be
derived from. (Located in file def_rep2.h)
CHROM
eval_chrom
The CHROM identifier specifies the complete chromosome class that all the other
phases manipulate (e.g., selection and crossover). (Located in file def_chr.h)
POP_TYPE
pop_gen
pop_ind
The POP_TYPE identifier specifies the style of reproduction (generational or
individual) that will be used. (Located in file def_pop.h)
SELECT
rank_sel
rank_sus_gen
rank_sus_ind
sus_gen
sus_ind
The SELECT identifier specifies the selection technique to use. (Located in
file def_phase.h)
REPLACE
gen_gap_e1
gen_e_n
repl_rank
repl_rk_sus
repl_sus
The REPLACE identifier specifies the replacement class to use. (Located in
file def_phase.h)
KILLER
kill_worst
rev_rank
rev_tourn
kill_term
The KILLER identifier specifies the replacement policy when using individual
reproduction. (Located in file def_kill.h)
CROSS_TYPE
n_pt
reduced_sur
The CROSS_TYPE identifier specifies which class to use for selecting the
desired number of cut points. This is the class where one_pt or two_pt will be
derived from. (Located in file def_xo.h)
CROSS
one_pt
two_pt
uniform_xo
The CROSS identifier specifies the crossover technique to use. (Located in
file def_phase.h)
MUTE
can_mute
The MUTE identifier specifies the mutation technique to use. (Located in file
def_phase.h)
STATS
basic_stat
The STATS identifier specifies the class to use for collecting and reporting
experiment statistics. (Located in file def_phase.h)
{tc "Adding a New GA Component" \L 2}Adding a New GA Component
This section describes how to add a new GA component to the Toolkit's library.
This process requires that you create the program code for your new component
and modify several of the existing source code files. The following describes
the necessary steps and illustrates this process by showing how the uniform
crossover technique was added to the Toolkit.
1. Create Program Code for New Component
The first step is to write the program code for your new component. The major
rule you must follow is that the class definition of your new component must
have the same public interface as the classes already defined in the Toolkit
for that phase. This is so that you can easily exchange the GA components.
The header files and comments in the source code describe the interfaces for
each phase. The source code files for your component should be placed in the
Source sub-directory where the Toolkit was installed. The more familiar you
are with the source code and the #define identifiers the easier it will be to
develop the code for your new technique.
As an example, the following describes the modifications that were necessary to
incorporate the uniform crossover technique. The first step was to look at the
public interface for the crossover phase. In the crossover hierarchy there
were originally four classes: one_pt, two_pt, n_pt, and reduced_sur. The
one_pt and two_pt classes implement crossover with the corresponding number of
cut points. These classes can be derived from either n_pt or reduced_sur so
that the desired number of cut point(s) can be randomly selected (n_pt) or
selected by the Reduced Surrogate method (class reduced_sur, which attempts to
create offspring that differ from the parents whenever possible). The other
phases of the GA will interact with an instance of the one_pt or two_pt class.
So we should first look at the public interface for these classes. The class
definition for one_pt is shown below (this is in the cross.h file).
class one_pt : public CROSS_TYPE {
public:
one_pt(void) : CROSS_TYPE(1){}
char* get_name(void);
};
The use of the CROSS_TYPE identifier allows us change which class (n_pt or
reduced_sur) one_pt will be derived from. The public interface contains a
constructor and the get_name method. The constructor calls the parent class'
constructor with a parameter for the number of cut points (i.e., 1). The
get_name method returns the name of the component for printing in the log file.
Now we need to look at the parent class for methods that it inherits. The
following shows the class definition for the n_pt class (located in the n_pt.h
file).
class n_pt {
public:
n_pt(int);
~n_pt(void) { delete cuts; }
char* get_name(void);
void cross(CHROM*, CHROM*, CHROM*, CHROM*);
protected:
int *cuts; //array of cut positions
int no_cuts; //number of cuts (n)
int total_cross; //total crosses to perform
int cur_cross; //crosses counter
virtual void pick_cuts(void); //pick cut point(s)
};
The cross method is the main procedure for implementing the crossover
technique. The cross method determines whether to perform crossover based on
the crossover probability. If crossover should be performed the pick_cuts
method is called to randomly select the desired number of cut points. Then the
cross method will pass this list of cut point(s) to the chromosome object to
actually exchange the desired segments from the parents to the offspring.
The class we created for uniform crossover needed a similar cross method but we
needed to pick the cut points differently. So we created a new class derived
from the n_pt class and overloaded the pick_cuts method. We also needed to
overload the get_name method so that it would return a name for the uniform
crossover method. The following shows the class definition and implementation
of the uniform_xo class (located in the unixo.h and unixo.C files).
class uniform_xo : public n_pt {
public:
uniform_xo(void) : n_pt(LCHROM-1){}
char* get_name(void);
protected:
void pick_cuts(void);
};
char* uniform_xo::get_name(void){
return( "Uniform Crossover" );
}
void uniform_xo::pick_cuts(void){
int i;
int cur_toss, prev_toss;
prev_toss = flip(SWAP_PROB);
no_cuts = 0;
for( i=1; i < LCHROM; i++ ){
cur_toss = flip(SWAP_PROB);
if(cur_toss != prev_toss){
cuts[no_cuts] = i;
no_cuts++;
prev_toss = cur_toss;
}
}
}
The constructor calls the n_pt constructor with the maximum number of cut
points allowed so that the data structures will be initialized correctly. The
get_name method just returns the name of the technique. And the pick_cuts
method determines whether each position on the chromosome string should be
swapped, according to the swapping probability. If the previous position
should be swapped and the current position should not (and vice versa) then
this is noted as a cut point. After the pick_cuts methods develops of the list
cut points the cross procedure (inherited from n_pt) will call a chromosome
method to actually exchange the segments between the parents and the offspring.
2. Update the Appropriate #define Identifier
After you create the program code for your new method, you will need to update
the file that contains the appropriate #define identifier. The Description of
#define Directives section above describes the purpose of each identifier and
the file where it is located. You will need to add a line to the file that
contains the appropriate identifier for the type of component you are adding.
A little later we will get more specific about the necessary modification, but
first you need to decide which #define identifier is appropriate for your new
component.
Returning to the uniform crossover example we have two possible #define
identifiers the might apply: CROSS_TYPE and CROSS. CROSS_TYPE is used to
specify which class the one_pt and two_pt classes will be derived from. The
CROSS identifier specifies which class is used for the crossover phase. Since
we have already specified that the uniform_xo class will be derived from the
n_pt class the CROSS_TYPE identifier is not applicable in this case. Moreover,
the uniform_xo class implements all the public interface features of a
technique for the crossover phase and thus it should be under the CROSS
identifier category.
The following shows part of the def_phase.h file, that contains the CROSS
#define identifier, and discusses the modifications that are require to update
this file.
// -- crossover
//#define CROSS uniform_xo
#define CROSS one_pt
//#define CROSS two_pt
// -- select
// ****** gen style
//#define SELECT sus_gen
//#define SELECT rank_sel
#define SELECT rank_sus_gen
// ****** individual style
//#define SELECT sus_ind
//#define SELECT rank_sus_ind
In the file that contains the appropriate #define identifier you will need to
add a line containing the #define preprocessor directive with the identifier
name and the class name for your new component. For example, "//#define CROSS
uniform_xo". The new line must be preceded by the comment symbol "//" (the new
style of C++ comments). This is necessary because the user interface program
changes which components are used by commenting out the #define directive of
the active component and then removing the comment symbol ("//") from the
#define directive for the component you selected. As you can see from the
section of the def_phase.h file shown above, only one definition for each
identifier will be active (i.e., not commented out) at any one time.
There are several reasons for listing all the appropriate values for each
identifier and keeping only one active by removing the comment symbol. First,
this is a convenient method to document the legal values for each identifier.
Second, this is an addition level of security to prevent inappropriate values
from being assigned to a #define identifier since the system must find an exact
match before it will remove the comment symbol.
3. Update Header Files
The next step is to place an #include preprocessor directive for the header
file of your new class in one of several composite header files. The composite
header files are files the contain nothing but #include directives for other
header files. The use of these composite header files simplifies the process
of making the appropriate section of the program code aware of and be able to
use your new component. So instead of having locate all the places in the
source code that need access to your new header file, you only need to add an
#include directive in one of the composite header files. And the composite
header files are already included in the appropriate places in the existing
source code.
Which composite header file you need to modify will depend on type of component
you are adding to the system. The following describes the type of components
that should be included in the different composite header files.
hdr_chr.h - this file includes header files for classes in the chromosome
hierarchy.
hdr_xo.h - this file includes header files for classes in the crossover
hierarchy.
hdr_gen.h - this file includes header files for components that can only be
used with generational reproduction GAs and that were not included in the
previous composite header files (i.e., hdr_chr.h or hdr_xo.h).
hdr_ind.h - this file includes header files for components that can only be
used in individual reproduction GAs and that were not included in the previous
composite header files.
hdr_both.h - this file includes header files for components that can be used
with either generational or individual reproduction GAs and that were not
included in the previous composite header files.
For example, we added the following line to the hdr_xo.h file for the uniform
crossover technique.
#include "unixo.h"
NOTE: You only need to add the #include directive for the header file of your
new component to one of the listed composite header files.
4. Update makefile
At this point you have made all the necessary modification to the source code.
Now its time to update the makefile so the make utility will be able to compile
your new component and link it with other modules to create an executable GA
program. There are several modifications you will need to make to the makefile
located in the Source sub-directory.
First, near the top of the makefile there are several variables defined with a
list of object files, you will need to add the object file for your new
component to one of these variables. The BOTH variable contains object files
that can be used with either generational or individuals reproduction GAs. The
GEN variable contains object files for components that can only be used with
generational reproduction GAs. And the IND variable contains object files for
components that can only be used with individual reproduction GAs. The
following shows the this portion of the makefile.
# object files for components that can be used with either
# generational or individual reproduction
BOTH = main.o chrom.o eval_chr.o cross.o n_pt.o rank_sel.o reduced.o \
can_mute.o stat.o populat.o unixo.o
# object files needed for Generational Reproduction GAs
GEN = repl_gen.o sus_gen.o rk_sus_gen.o pop_gen.o
# object files needed for Individual Reproduction GAs
IND = killer.o repl_rank.o repl_sus.o sus_ind.o repl_rk_sus.o \
rk_sus_ind.o pop_ind.o
The object file for uniform crossover (unixo.o) was added to the BOTH variable
since it can be used with either type of GA.
The second modification to the makefile is to add the dependency information
for the object file of your component and add the commands for updating the
object file if the dependents have been modified since the object file was
created. The following shows the section of the makefile relating to the
crossover object files.
#----- Crossover Classes - implement crossover techniques
#
# cross.* implements 1 and 2 point crossover
# n_pt.* implements Traditional crossover (where cut points
# randomly selected).
# reduced.* implements Reduced Surrogate crossover
# unixo.* implements Uniform crossover
cross.o : cross.C hdr_xo.h extern.h
CC +i -c -g cross.C
rm *..c
n_pt.o : n_pt.C n_pt.h $(HDRS)
CC +i -c -g n_pt.C
rm *..c
reduced.o : reduced.C reduced.h $(HDRS)
CC +i -c -g reduced.C
rm *..c
unixo.o : unixo.C hdr_xo.h extern.h
CC +i -c -g unixo.C
rm *..c
For example, the uniform crossover object file is depended on several header
files and the source code file that implements the methods defined for this
class. The lines below the dependency information are the commands necessary
to update date the object file.
And the final modification to the makefile is to add a new dependent for the
composite header file where you added the #include directive for your
component's header file in Step 3 above. The following shows the section of
the makefile for the hdr_xo.h file where we added the #include directive for
the header file of the uniform crossover technique.
hdr_xo.h : n_pt.h reduced.h cross.h unixo.h
touch hdr_xo.h
Here we only need to modify the time and date of the file (by the touch
command) to update the composite header file. This causes the object files
that have this file as a dependent to be updated since the hdr_xo.h file is now
more currently than the object files.
5. Create a Help File
Now you need to create a help file for your new component. The information in
this file will be displayed when the user selects your component and presses
the F1 key while in the Select GA Components section of the user interface
program.
All of the help files are located in the Help sub-directory under the main
directory where the GA Toolkit was installed. You will need to go to this
directory to create the help file. In the Help directory there is a file
called HELP.TEMP. This provides a template for creating your help file. To
create your help file you will need to make a copy the HELP.TEMP file. Then
you will need to edit your new file with a text editor and add the information
about your new component.
6. Update Initialization File for the User Interface Program
Now comes the fun part! For you to be able to choose your new component
through the user interface program you need to update an initialization file
the system reads about the available components. By placing this information
in an input file, rather than hard-coding it, it is easier to update the user
interface program when new components are added. This initialization file
(called gat.ini in the Uif sub-directory) contains information about how to
display the components and the constraints on which component can be used
together.
The first line of the file contains a number for the number of entries in the
file. The user interface program uses an array data structure to store and
manipulate the information about each component. Knowing the number of entries
in the file makes it easier to create this data structure. You will need to
increment this number for each entry you add to this file.
The entries about each components contain several fields. Each field, with the
exception of Constraints, must be delimited by commas. The following describes
each field. An actual portion of the gat.ini file will be presented and
explained later.
Menu Category - this is the name to display on the menu of components (e.g.,
Chromosome or Selection).
Sub-Menu - this indicates whether there is a sub-menu for the category and the
name of the sub-category (e.g., Representation and Decoding are sub-categories
for Chromosome). Use "None" in this field if there is no sub-category.
Name of Component - this is name that will be displayed for the component
(e.g., GENESIS Rank and Linear Rank components in the Selection category).
#define Identifier - this is the #define identifier associated with the
component (e.g., CROSS is the #define identifier for Uniform Crossover).
Value for #define - this the value to associate with the #define identifier
when the component is selected (e.g., we would give CROSS the value uniform_xo
when Uniform Crossover is selected).
Help file - this is the name of the file (in the Help sub-directory) that
contains information about the component to displayed when the user presses the
F1 key.
Number of Constraints - this is the number of lines that follow describing the
constraints about which components can be used with this technique.
Constraints - each constraint is on a separate line following the entry for the
component. Each constraint line must begin with a tab character. Constraints
have the following format:
identifier operator value
The system currently understands the four operators described below. Examples
of these operators will be discussed later.
= This is the equal operator. It means that the current value associated with
the #define identifier must be equal to the value specified in the constrain
for this component to be compatible with the current configuration.
!= This is the not equal operator. It means that as long as the current value
associated with the #define identifier is not equal to the value specified in
the constraint then this component is compatible with the current
configuration.
> This is the implies operator. It means that if this component is selected
and all the other constraints are satisfied then the component designated by
the #define identifier and value of the constraint must also be selected. For
example, if you are using SUS selection with individual reproduction, the
repl_sus class must be used in the replacement phase.
?= This is the if operator. It means that if the identifier is currently
associated with the value in the constraint then evaluate the constraints in
the following lines preceded by two tab characters.
The follow shows portions of the gat.ini file. The actual contents of the file
are print in a Courier font and additional comments explaining the entries are
printed in a Times Roman font.
The first three entries in the file are for the chromosome category. The first
entry is for the Representation sub-category. The name of the component is 1-D
Bit String (the third field). When this component is selected the system will
assign the REP_CHROM identifier the value of bit_chrom (the fourth and fifth
fields). The bit_str.hlp file in the Help sub-directory contains additional
information the user may access via the F1 key. And the last field contains a
zero indicating that there are no constraints for this component.
24
Chromosome,Representation,1-D Bit String,REP_CHROM,bit_chrom,bit_str.hlp,0
Chromosome,Decoding,Binary,CODE_CHROM,bit_chrom,binary.hlp,0
Chromosome,Decoding,Gray,CODE_CHROM,gray_chrom,gray.hlp,0
The entry below is for the Individual Reproduction. This is probably the most
complicated entry because of the various constraints. But this is a good
example to explain how the constraints work. All of the constraints in this
case use the if operator. The system will evaluate each constraint with the if
operator and if the condition is true it will evaluate the constraints below it
that are precede by two tab characters. So the first two constraints mean that
when the user selects individual reproduction if we are currently using the
generational version of SUS (sus_gen) or GENESIS Rank (rank_sus_gen),
respectively, that we should switch to the individual reproduction version of
the selection technique. The third constraint has two if conditions. It means
the if Linear Rank selection (rank_sel) and the Generational Emulation
replacement policy (kill_term) are currently selected the we need to change the
replacement policy to the Remove Worst technique (kill_worst) since
Generational Emulation is not compatible with Linear Rank selection. And the
final constraint is a little confusing. It says the if Linear Rank selection
is being used then we need to change the selection component to Linear Rank.
Now this seem silly. But the implies constraint (>) actually causes the
program to evaluate the constraints for the component we what to change. So
the program will evaluate the constraints for Linear Rank selection. One of
these constraints is that if individual reproduction is being used then we also
need the repl_rank component for the replacement phase which inserts new
offspring into the population it the proper order according to their rank.
Reproduction,None,Individual,POP_TYPE,pop_ind,ind.hlp,9,
SELECT ?= sus_gen
SELECT > sus_ind
SELECT ?= rank_sus_gen
SELECT > rank_sus_ind
SELECT ?= rank_sel
KILLER ?= kill_term
KILLER > kill_worst
SELECT ?= rank_sel
SELECT > rank_sel
The entries below are most of the entries for the Selection category. The
first entry is for the generational version of SUS selection. The only
constraint on this component is that the style for reproduction must be
generational (pop_gen). The last entry is for Linear Rank selection with
individual reproduction. The constraint says we must be using individual
reproduction for this component to be compatible. The second constraint says
the we can use Linear Rank selection as long as we are not using the
Generational Emulation replacement policy (kill_term). And the final
constraint says the to use this component we must also use the repl_rank class
for the replacement phase so that the offspring will be inserted in the proper
position within the population, according to their rank.
Selection,None,SUS,SELECT,sus_gen,sus.hlp,1,
POP_TYPE = pop_gen
Selection,None,SUS,SELECT,sus_ind,sus.hlp,2,
POP_TYPE = pop_ind
REPLACE > repl_sus
Selection,None,Linear Rank,SELECT,rank_sel,lrank.hlp,1,
POP_TYPE = pop_gen
Selection,None,Linear Rank,SELECT,rank_sel,lrank.hlp,3,
POP_TYPE = pop_ind
KILLER != kill_term
REPLACE > repl_rank
The entries below are for the Crossover category. The first entry is the item
we added for the Uniform crossover technique. In addition, we needed to add
several constraints to the next two entries (Traditional and Reduced Surrogate
crossover, respectively). These constraints mean the if we select either
Traditional or Reduced Surrogate crossover when Uniform crossover is the
component that we are currently using, then we need to change the CROSS
identifier to one_pt. That is, it is invalid for the CROSS_TYPE identifier to
be associated with either n_pt (Traditional) or reduced_sur (Reduced Surrogate)
when the CROSS identifier is set to uniform_xo (Uniform crossover). Since the
user has selected either Traditional or Reduced Surrogate crossover we need to
change the CROSS identifier to either one_pt or two_pt and by default the
system will use one_pt.
Crossover,Style,Uniform XO,CROSS,uniform_xo,unif_xo.hlp,0
Crossover,Style,Traditional,CROSS_TYPE,n_pt,trad_xo.hlp,2,
CROSS ?= uniform_xo
CROSS > one_pt
Crossover,Style,Reduced Surrogate,CROSS_TYPE,reduced_sur,surr_xo.hlp,2,
CROSS ?= uniform_xo
CROSS > one_pt
Crossover,# Cut Points,One,CROSS,one_pt,none,0
Crossover,# Cut Points,Two,CROSS,two_pt,none,0
So to be able to select your new component through the user interface program
you will need to add an entry to the gat.ini file. When you add your entry you
should place it in the file with the other entries for that type of technique.
For example if your created a new selection technique you should add the new
entry in the area of the gat.ini file the has other entries for the selection
category. In addition, you may need to modify the constraints for some the
existing entries to ensure compatibility with your new component. And finally,
do not forget to update the number on the first line of gat.ini file to
accurately represent the number of component entries in the file.
After you have modified the gat.ini file you can execute the user interface
program and you should be allow to select your new component. There is no need
to re-compile the user interface program since gat.ini is only an input file.
{tc "Adding a New Parameter" \L 2}Adding a New Parameter
This section describes how you can add a new input parameter for the GA
programs that you create with the GA Toolkit. This is especially useful if you
create a new component that needs a new parameter. For example, for the
Uniform crossover technique that was recently added to the system needed a
parameter to specify the probability for swapping each position on the
chromosome. The following describes the process for adding a new parameter.
1. Declaring a Variable for the New Parameter
The first step is to declare a new variable for your parameter. You will need
to modify two files to declare your new variable and to provide the component
access to your parameter. All user-defined input parameters for the GA Toolkit
are declared as global variables so that all component will have access to the
parameters they require.
The global.h file in the Source sub-directory contains all the declaration of
global variables. You will need to add a declaration for your new variable in
this file. For example, we added the following for the swapping probability
parameter.
double SWAP_PROB;
The second file you need to modify is extern.h (in the Source sub-directory).
This file contains all the external declaration for the global variables. This
is necessary so that all the components will have access to the proper
variables. For example, we added the following external declaration for the
swapping probability parameter.
extern double SWAP_PROB;
2. Update the Input Section
In this step you need to modify the input.h file (in the Source sub-directory).
This file contains several #define directives used for reading the input
parameters from the in file. The INPUT_FORM identifier is used for the fscanf
format string. And the INPUT_VARS identifier is used as the list of input
parameter variables in the fscanf call. So by updating the #define identifies
in the input.h file the program will be able to read in your new parameter.
The following shows a section of the input.h file where we updated the
INPUT_FORM and INPUT_VARS #define identifiers.
#define INPUT_FORM " \
Total Experiments = %d \
Total Trials = %d \
Population Size = %d \
DeJong Gap = %lf \
Elite Gap = %lf \
Prob. Cross = %lf \
Prob. Mutation = %lf \
Selection Bias = %lf \
Replacement Bias = %lf \
Maximum Spin = %d \
Statistics Interval = %d \
Hamming Metrics = %d \
Best Size = %d \
Worst Size = %d \
Maximize = %d \
Evaluate All = %d \
Seed Population = %d \
Debugging Level = %d \
Display = %d \
Print Generations = %d \
Random Seed = %lu \
Swapping Prob. = %lf"
#define INPUT_VARS &MAX_EXPER, &MAX_TRIALS, &POP_SIZE, &GAP, \
&ELITE_GAP, &P_CROSS, &P_MUTE, &SELECT_BIAS, &REPLACE_BIAS, \
&MAX_SPIN, &STATS_GAP, &HAM_FLAG, &BEST_SIZE, &WORST_SIZE, &MAX_FLAG, \
&EVAL_ALL, &INIT_FLAG, &DEBUG_FLAG, &DISPLAY_FLAG, &PRINT_GEN, &SEED, \
&SWAP_PROB
As you can see we just added our new "Swapping Prob." parameter to the end of
the INPUT_FORM identifier and the variable SWAP_PROB to the end of the
INPUT_VARS identifier. Note: The INPUT_VARS identifier is used as the list of
argument to the fscanf function so the "&" in front of the variable name is
necessary so the value will be actually be store in the variable after the
fscanf function returns.
3. Update the Help File for Input Parameters
The next step is to add a short description about your new parameter to the
in.hlp file (in the Help sub-directory). This file contains information about
all the input parameters. You should use a text editor to edit the in.hlp file
and add some information about your new parameter.
4. Update the Parameter Initialization File for the User Interface Program
The final step for adding a new parameter is to update an initialization file,
called gat_in.ini in the Uif sub-directory, so that you can create input files
with this new parameter via the user interface program. The gat_in.ini file
contains information about the parameters so the user interface program can
ensure that only appropriate values are entered for each parameter.
The first line in the gat_in.ini file contains the number parameter entries in
the file. The user interface program uses an array data structure for storing
and manipulation the parameter information. The number on the first line of
the file makes it easier to create this data structure. When you add
information to this file for new parameters you will need to update this number
to accurately reflect the number of entries in the file.
Each remaining lines in the file contains information about one parameter.
Each line or entry contains several comma delimited fields described below.
The actual gat_in.ini file will be presented and explained later.
Parameter Name - this is a name for the parameter. You must use exactly the
same name you used in the INPUT_FORM identifier in the input.h file. For
example, for the swapping probability parameter we used the name "Swapping
Prob." so that is the value we must use for this field.
Default Value - this the default value for the parameter that will initially be
displayed in the user interface program where the user begins creating the in
file.
Type of Parameter - this field is used so that the validity of the user's input
can be checked. The system currently understands the following entries for
this field.
1 the value for this parameter must be greater than the value of the Minimum
field.
2 the value for this parameter must be greater than or equal to the value of
the Minimum field.
3 the value for this parameter must be an even integer.
4 the value for this parameter must be in the range [Minimum field, Maximum
field].
5 the value for this parameter must be in the range (Minimum field, Maximum
field].
6 the value for this parameter can be 0 or 1. This is for boolean flags. A
value of 0 is displayed as "no" and a 1 is displayed as "yes" in the user
interface program. The default value for this field must be 1 or 0 (not yes or
no).
Minimum - this field is only used with parameters of type 1, 2, 4 or 5.
Maximum - this field is only used with parameters of type 4 or 5.
The following shows the actual gat_in.ini file and explains several of the
entries. The contents of the file is printed in a Courier font and additional
comments are printed in a Times Roman font.
22
Total Experiments,1,1,0
The Total Trials parameter has a default value of 1000. This is a type 1
parameter and the Minimum field is 0 which means that the value for this
parameter must be greater than 0.
Total Trials,1000,1,0
Population Size,50,3
DeJong Gap,1.0,4,0.0,1.0
Elite Gap,1.0,4,0.0,1.0
The Prob. Cross entry is for the cross probability parameter. It has a default
value of 1.0. This is a type 4 parameter with a minimum of 0.0 and maximum of
1.0. So the values for this parameter must be in the following range [0.0,
1.0].
Prob. Cross,1.0,4,0.0,1.0
Prob. Mutation,0.001,4,0.0,1.0
Selection Bias,1.5,5,1.0,2.0
Replacement Bias,1.5,5,1.0,2.0
Maximum Spin,3,1,0
Statistics Interval,200,1,0
Hamming Metrics,0,6
Best Size,1,2,0
Worst Size,3,2,0
The Maximize parameter is boolean flag (type 6) parameter for whether the GA
will be applied to a maximization or minimization problem. The default value
is 1 (the flag is on) indicating a maximization problem.
Maximize,1,6
Evaluate All,0,6
Seed Population,0,6
Debugging Level,0,2,0
Display,0,6
Print Generations,0,6
Random Seed,123456789,2,0
The Swapping Prob. entry is what we added for the swapping probability
parameter for the Uniform crossover technique. The default value is 0.5 or on
average half the positions on the chromosome will be swapped. This a type 4
parameter with a minimum of 0.0 and a maximum of 1.0. So this parameter can be
assigned values in the following range [0.0, 1.0].
Swapping Prob.,0.5,4,0.0,1.0
When you add an entry for your new parameter it must be in the same relative
position as you added the information in the input.h file. For example, we
added the Swapping Prob. parameter to the end of the INPUT_FORM and INPUT_VARS
#define identifiers in the input.h file so we must also add our entry for this
parameter to the end of the list in the gat_in.ini file. You can arrange the
parameters in
any order you like also long as you maintain the same order in the gat_in.ini
file and in the INPUT_FORM #define identifier (with corresponding variables in
the INPUT_VARS #define identifier).
After you have completed the necessary modification to the gat_in.ini file you
can execute the user interface program and create an in input file with your
new parameter.
{tc "REFERENCES" \l 1}References
Ackley, D. H. A Connectionist Machine for Genetic Hillclimbing. Boston, MA:
Kluwer Academic Publishers, 1987.
Baker, J. E. "Adaptive Selection Methods for Genetic Algorithms." In
Proceedings of the First International Conference on Genetic Algorithms and
Their Applications, ed. J. J. Grefenstette, 101-111. Hillsdale, NJ: Lawrence
Erlbaum Associates, 1985.
Baker, J. E. "Reducing Bias and Inefficiency in the Selection Algorithm." In
Genetic Algorithms and Their Applications: Proceedings of the Second
International Conference on Genetic Algorithms, ed. J. J. Grefenstette, 14-21.
Hillsdale, NJ: Lawrence Erlbaum Associates, 1987.
Booker, L. "Improving Search in Genetic Algorithms." In Genetic Algorithms
and Simulated Annealing, ed. Lawrence Davis, 61-73. Los Altos, CA: Morgan
Kaufmann Publishers, Inc., 1987.
DeJong, K. A. "An Analysis of the Behavior of a Class of Genetic Adaptive
Systems." Ph.D. dissertation, University of Michigan, 1975.
Frederick, W., R. Sedlmeyer and C. White. "The Hamming Metric in Genetic
Algorithms and Its Application to Two Network Problems." Computer Science
Department, Indiana University-Purdue University at Fort Wayne, (unpublished
manuscript), 1991.
Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine
Learning. Reading, MA: Addison-Wesley Publishing Company, Inc., 1989.
Grefenstette, J. J. GENESIS Software (Version 5.0). Distributed by The
Software Partnership, Melrose, MA, 1990.
Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
The University of Michigan Press, 1975.
Lienau, S. A. Effective Investigation of Genetic Algorithms. Master's Thesis,
University of Missouri-Kansas City, 1992.
Syswerda, G. "Uniform Crossover in Genetic Algorithms." In Proceedings of the
Third International Conference on Genetic Algorithms, ed. J. D. Schaffer, 2-9.
San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1989.
Whitley, D. "The GENITOR Algorithm and Selection Pressure: Why Rank-Based
Allocation of Reproductive Trials is Best." In Proceedings of the Third
International Conference on Genetic Algorithms, ed. J. D. Schaffer, 116-121.
San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1989.
Whitley, D. "Fundamental Principles of Deception in Genetic Search." In
Foundations of Genetic Algorithms, ed. G. J. E. Rawlins, 221-241. San Mateo,
CA: Morgan Kaufmann Publishers, Inc., 1991.
{PAGE|33}
{PAGE|i}
{PAGE|55}
{PAGE|57}
{PAGE|56}